AutoGraph: Optimizing DNN Computation Graph for Parallel GPU Kernel Execution

Authors

Abstract

Deep learning frameworks optimize computation graphs and intra-operator computations to boost inference performance on GPUs, while inter-operator parallelism is usually ignored. In this paper, a unified framework, AutoGraph, is proposed to obtain highly optimized computation graphs in favor of parallel execution of GPU kernels. A novel dynamic programming algorithm, combined with backtracking search, is adopted to explore the optimal graph optimization solution, with fast estimation from a mixed critical path cost. Accurate runtime information, based on multi-stream execution launched with CUDA Graph, is utilized to determine the convergence of the optimization. Experimental results demonstrate that our method achieves up to a 3.47x speedup over existing methods. Moreover, AutoGraph outperforms state-of-the-art kernel launch approaches by 1.26x.
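To make the idea of a critical path cost concrete, the following is a minimal sketch in Python, not the paper's method. The abstract does not define its "mixed" critical path cost, so this example uses the plain longest-path cost through an operator DAG as a stand-in: an estimate of latency when independent operators may run in parallel. The function name `critical_path_cost`, the operator names, and the per-operator costs are all hypothetical.

```python
from collections import deque


def critical_path_cost(costs, edges):
    """Longest-path cost through a DAG of operators.

    costs: dict mapping operator name -> estimated kernel time (hypothetical values)
    edges: iterable of (producer, consumer) dependency pairs
    """
    succs = {op: [] for op in costs}
    indeg = {op: 0 for op in costs}
    for src, dst in edges:
        succs[src].append(dst)
        indeg[dst] += 1

    # finish[op] = earliest finish time of op if independent ops overlap freely
    finish = dict(costs)
    ready = deque(op for op in costs if indeg[op] == 0)
    visited = 0
    while ready:  # Kahn-style topological traversal
        op = ready.popleft()
        visited += 1
        for nxt in succs[op]:
            finish[nxt] = max(finish[nxt], finish[op] + costs[nxt])
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                ready.append(nxt)
    if visited != len(costs):
        raise ValueError("graph contains a cycle")
    return max(finish.values(), default=0.0)


# Hypothetical graph: two independent convolutions feeding a concat.
costs = {"input": 0.5, "conv_a": 3.0, "conv_b": 2.0, "concat": 1.0}
edges = [("input", "conv_a"), ("input", "conv_b"),
         ("conv_a", "concat"), ("conv_b", "concat")]

parallel_estimate = critical_path_cost(costs, edges)  # input -> conv_a -> concat
sequential_estimate = sum(costs.values())             # every kernel back to back
```

In this toy graph the two convolutions do not depend on each other, so the parallel estimate follows only the longer branch, while the sequential estimate adds up every operator. The gap between the two is the kind of inter-operator parallelism the abstract says is usually ignored.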

Similar resources

Parallel computation framework for optimizing trailer routes in bulk transportation

We consider a rich tanker trailer routing problem with stochastic transit times for chemicals and liquid bulk orders. A typical route of the tanker trailer comprises sourcing a cleaned and prepped trailer from a pre-wash location, pickup and delivery of chemical orders, cleaning the tanker trailer at a post-wash location after order delivery, and prepping for the next order. Unlike traditiona...

Proposal 1 Optimizing parallel iterative graph computation

I propose to develop a deterministic parallel framework for performing iterative computation on a graph which schedules work on vertices based upon a valid coloring. Preliminary work modifying Graphlab, a parallel framework for implementing iterative machine learning algorithms on data graphs, has demonstrated the merits of this approach by improving performance and eliminating non-determinism....

Bipartite Graph Matching Computation on GPU

The Bipartite Graph Matching Problem is a well-studied topic in Graph Theory. Such a matching relates pairs of nodes from two distinct sets by selecting a subset of the graph edges connecting them. No selected edge shares an endpoint with any other edge in the subset. When the considered graph has huge sets of nodes and edges the sequential approaches are impractical, specia...

Multi-level Parallel Query Execution Framework for CPU and GPU

Recent developments have shown that classic database query execution techniques, such as the iterator model, are no longer optimal to leverage the features of modern hardware architectures. This is especially true for massive parallel architectures, such as many-core processors and GPUs. Here, the processing of single tuples in one step is not enough work to utilize the hardware resources and t...

Identifying Optimization Opportunities within Kernel Execution in GPU Architectures

Tuning codes for GPGPU architectures is challenging because few performance tools can pinpoint the exact causes of execution bottlenecks. While profiling applications can reveal execution behavior with a particular architecture, the abundance of collected information can also overwhelm the user. Moreover, performance counters provide cumulative values but do not attribute events to code regio...

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2023

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v37i9.26343